In recent years, sound classification has become a valuable tool in real-time monitoring systems, particularly for public safety and smart surveillance. One powerful model for sound classification is YAMNet, developed by Google. In this blog post, we will walk you through how we used YAMNet along with a custom-trained dataset to build an emergency sound detection system capable of identifying critical sounds like gunshots, glass breaks, and animal attacks.
What is YAMNet?
YAMNet (Yet Another Mobile Network) is a pre-trained deep learning model that classifies audio into a wide range of categories based on the AudioSet ontology by Google. It uses a MobileNet v1 architecture, which is lightweight yet efficient, making it suitable for real-time audio classification tasks.
Here’s how YAMNet works:
- It takes an input audio waveform (mono, 16 kHz sample rate).
- The model first converts the audio into log-mel spectrogram features.
- These features are passed through a convolutional neural network (CNN).
- Finally, the model outputs a list of predicted sound classes (from over 500 possible classes in AudioSet), each with a confidence score between 0 and 1.
Emergency Sound Detection Workflow
We built an emergency sound detection system that works in real-time and can identify high-risk sounds such as:
- Gunshots (e.g., pistol, rifle, shotgun)
- Glass breaking
- Animal roars (e.g., tiger, lion, dog)
Here’s the basic workflow we followed:
Dataset Preparation: Added a varied dataset of emergency sounds such as gunshot, glass break, howls, etc, from various sources.
Feature Extraction using YAMNet: We used YAMNet to extract embeddings (1024-dimensional feature vectors) from each audio file in the dataset. These embeddings represent the key characteristics of each sound.
Training a Custom Classifier: We trained a machine learning model (such as Random Forest or MLP) using the extracted embeddings to classify emergency sounds into specific categories.
Real-Time Detection: We implemented a live microphone input system that:
- Continuously captures audio
- Extracts embeddings using YAMNet
- Feed them into the trained classifier
- Displays alerts when an emergency sound is detected
Implementation Overview
After designing the workflow, we implemented the project using Python and TensorFlow. The implementation was divided into three main parts:
1. Audio Feature Extraction using YAMNet
We used the pre-trained YAMNet model to extract embeddings from my emergency audio dataset. These embeddings are high-level features that represent the sound in a compact form.
# Example: Extract embeddings from an audio file
import tensorflow as tf
import yamnet as yamnet_model
import soundfile as sf
import numpy as np
# Load the YAMNet model
yamnet = yamnet_model.yamnet_frames_model()
yamnet.load_weights('yamnet.h5')
# Load the audio
waveform, sr = sf.read('sample_gunshot.wav')
if sr != 16000:
# Resample if needed
pass
# Extract embeddingsscores, embeddings, spectrogram = yamnet(waveform)
2. Training the Custom Classifier
Once we had the embeddings, we trained a custom classifier using scikit-learn. This allowed me to fine-tune detection for only a few important classes, such as gunshot, glass_break, and dog_bark.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Load your labeled embeddings
X = ... # YAMNet embeddings
y = ... # Corresponding labels
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train classifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
3. Real-Time Detection System
For real-time detection, we created a script that listens to the microphone input, processes small audio chunks using YAMNet, and classifies them with the trained model. When it detects a critical sound, it shows an alert.
import sounddevice as sd
def callback(indata, frames, time, status):
# Preprocess live audio input
# Extract YAMNet embeddings
# Predict class with trained classifier
# Display alert if necessary
pass
# Start microphone stream
sd.InputStream(callback=callback).start()
Use Cases of Emergency Sound Detection
This system can be used in a variety of safety-focused environments:
- Detect breaking glass or animal sounds near the home.
- Public Surveillance: Detect gunshots or distress sounds in public areas.
- Wildlife Monitoring: Identify animal roars to track potential threats or endangered species.
- Industrial Safety: Detects accidents, crashes, or dangerous sound patterns in factories.
- School or Campus Safety: Gunshots or screams in hallways or outdoors, fights or aggressive sounds.
- Wildlife Monitoring & Safety: Use in: farms or forest-edge homes, detect animal roars or dog barks
Conclusion
By combining YAMNet’s powerful feature extraction with a custom-trained classifier tailored for emergency audio events, we’ve built a scalable and accurate detection system. This solution can be integrated into smart security systems, healthcare monitoring tools, and public safety infrastructure to improve response times and reduce risks.
The modular design allows it to adapt to different scenarios with minimal changes. As we continue to improve accuracy and reduce false positives, this system can become a reliable layer of safety for both consumers and enterprises.